Shaping in Practice: Training Wheels to Learn Fast Hopping Directly in Hardware
Authors
Abstract
Learning robot controllers instead of designing them can greatly reduce the engineering effort required, while also emphasizing robustness. Despite considerable progress in simulation, applying learning directly in hardware is still challenging, in part due to the necessity to explore potentially unstable parameters. We explore the concept of shaping the reward landscape with training wheels: temporary modifications of the physical hardware that facilitate learning. We demonstrate the concept with a robot leg mounted on a boom learning to hop fast. This proof of concept embodies typical challenges such as instability and contact, while being simple enough to empirically map out and visualize the reward landscape. Based on our results, we propose three criteria for designing effective training wheels for learning in robotics. A video synopsis can be found at https://youtu.be/6iH5E3LrYh8.
Similar resources
Shaping in Reinforcement Learning by Changing the Physics of the Problem
Children learn to ride a bicycle by using training wheels. They are actually trying to learn one task (riding without training wheels) by training on another one. In general, solving a difficult problem can be facilitated by training on other problems. This is the basic idea of shaping. It is essential to ensure that spending time on the modified task will help solve the original one. In this paper...
Comparison of the Effect of 6 Weeks of Balancing and Hopping Strengthening Training on the Kinematics of the Lower Extremities of Athletes with Functional Ankle Instability while Running: A Randomized Controlled Trial
Introduction: Ankle sprains are one of the most common sports injuries. This injury can affect the kinematics of the athlete's lower extremities. Therefore, the aim of this study was to compare the effect of 6 weeks of balancing and hopping strengthening training on the kinematics of the lower extremities of athletes with functional ankle instability while running. Methods: The present...
ROCK∗ - Efficient black-box optimization for policy learning
Robotic learning on real hardware requires an efficient algorithm which minimizes the number of trials needed to learn an optimal policy. Prolonged use of hardware causes wear and tear on the system and demands more attention from an operator. To this end, we present a novel black-box optimization algorithm, Reward Optimization with Compact Kernels and fast natural gradient regression (ROCK). O...
Effect of six week hopping exercise on time to stabilization and perceived instability of athlete with chronic ankle instability during single leg jump landing
Introduction: The purpose of this study was to examine the effect of a 6-week hopping exercise program on time to stabilization and perceived stability in athletes with chronic ankle instability. Methods: Twenty-eight basketball players with chronic ankle instability (mean ± SD age: 22.67 ± 2.88 years; mean ± SD weight: 80.47 ± 8.48 kg; mean ± SD height: 186.82 ± 3.09 cm) participated in this...
Safe Online Learning Using Barrier Functions
We present a method for guaranteeing the safety of online learning schemes. The method uses barrier certificates and Sums-of-Squares programming to find a safe region of state space and a controller which renders that space positively invariant. This safe set and controller are then used to create "training wheels", which can be added to any controller to create a guaranteed safe controller. Th...
Journal title:
- CoRR
Volume abs/1709.10273, issue -
Pages -
Publication date: 2017